Duplicate Image Detection in Large Scale Databases
Abstract
We propose a duplicate image detection method for identifying modified copies of the same image in a very large database. The modifications we consider include rotation, scaling, and cropping. We introduce a compact 12-dimensional descriptor based on the Fourier-Mellin transform. The compactness of this descriptor allows efficient indexing over the entire database. Results on a database of 10 million images demonstrate the effectiveness and efficiency of the descriptor. We also propose extensions to arbitrary shape representations and to similar-scene detection, and include preliminary results for both.
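The abstract does not say how the 12 dimensions are computed, so the following is only a minimal sketch of a Fourier-Mellin-style descriptor under stated assumptions, not the authors' construction. The function name `fourier_mellin_descriptor`, the log-polar grid size, and the choice of a 3 x 4 block of low-frequency coefficients are all hypothetical. It illustrates the general idea: the FFT magnitude discards translation, log-polar resampling turns rotation and scaling into shifts, and a second FFT magnitude suppresses those shifts.

```python
import numpy as np

def fourier_mellin_descriptor(image, n_radii=32, n_angles=32, dims=12):
    """Illustrative Fourier-Mellin-style descriptor (hypothetical construction).

    Assumes a 2-D grayscale array of at least 8 x 8 pixels.
    """
    img = np.asarray(image, dtype=np.float64)

    # Step 1: the FFT magnitude is unchanged by (circular) translation.
    mag = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))

    h, w = mag.shape
    cy, cx = h // 2, w // 2
    max_r = min(cy, cx) - 1

    # Step 2: sample the spectrum on a log-polar grid (nearest neighbour).
    # Rotation becomes a shift along the angle axis and scaling a shift
    # along the log-radius axis. Half a plane suffices because the
    # magnitude spectrum of a real image is symmetric.
    radii = np.exp(np.linspace(0.0, np.log(max_r), n_radii))
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    ys = np.rint(cy + radii[:, None] * np.sin(angles)[None, :]).astype(int)
    xs = np.rint(cx + radii[:, None] * np.cos(angles)[None, :]).astype(int)
    log_polar = mag[ys, xs]

    # Step 3: a second FFT magnitude discards those shifts. This is exact
    # only for circular shifts, so scale invariance is approximate here.
    spectrum = np.abs(np.fft.fft2(log_polar))

    # Step 4: keep a small block of low-frequency coefficients (assumed
    # 3 x 4 = 12 values) and normalise to unit length.
    coeffs = spectrum[:3, :4].ravel()[:dims]
    norm = np.linalg.norm(coeffs)
    return coeffs / norm if norm > 0 else coeffs
```

A descriptor this small can be compared with a plain Euclidean distance and stored in a standard multidimensional index, which is the kind of efficiency the abstract attributes to compactness. Cropping robustness is not addressed by this sketch.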
Similar resources
Speed-up Multi-modal Near Duplicate Image Detection
Near-duplicate image detection is a necessary operation to refine image search results for efficient user exploration. The existence of large numbers of near duplicates requires fast and accurate automatic near-duplicate detection methods. We have designed a coarse-to-fine near-duplicate detection framework to speed up the process and a multi-modal integration scheme for accurate detection. The...
Near Duplicate Image Detection: min-Hash and tf-idf Weighting
This paper proposes two novel image similarity measures for fast indexing via locality-sensitive hashing. The similarity measures are applied and evaluated in the context of near-duplicate image detection. The proposed method uses a visual vocabulary of vector-quantized local feature descriptors (SIFT) and exploits enhanced min-Hash techniques for retrieval. Standard min-Hash uses an approximat...
An image signature for any kind of image
We describe an algorithm for computing an image signature, suitable for first-stage screening for duplicate images. Our signature relies on relative brightness of image regions, and is generally applicable to photographs, text documents, and line art. We give experimental results on the sensitivity and robustness of signatures for actual image collections, and also results on the robustness of ...
TA-DRD: A Three-step Automatic Duplicate Record Detection
Duplicate record detection is a key step in Deep Web data integration, but existing approaches do not adapt to its large-scale nature. In this paper, a three-step automatic approach is proposed for duplicate record detection in the Deep Web. It first uses a cluster ensemble to select initial training instances. It then uses tri-training classification to construct a classification model. Final...
Fast Convex Layers Algorithm for Near-Duplicate Image Detection
This paper builds on a novel, fast algorithm for generating the convex layers on grid points with linear time complexity. Convex layers are extracted from the binary image. The obtained convex hulls are characterized by the number of their vertices and used as representative image features. A computational geometric approach to near-duplicate image detection stems from these features. Similarit...